Allelic series of genomic modifications in cells

ABSTRACT

The present invention relates to methods of producing an allelic series of modifications in genes of interest in a cell. In particular, the invention provides methods for using nucleic acid sequence-modifying agents (e.g., chemicals, electromagnetic radiation, etc.) to introduce modifications in any nucleic acid sequence in the genome of a cell. Also provided are sets of cells which contain at least one modification in any gene of interest. The methods and compositions of the invention are useful in determining the function of the gene of interest.

FIELD OF THE INVENTION

The present invention relates to methods of producing modifications in genes of interest in a cell. In particular, the invention provides methods for using nucleic acid sequence-modifying agents to introduce modifications in any gene of interest in the genome of a cell. Also provided are sets of cells which contain at least one modification in any gene of interest. The methods and compositions of the invention are useful in determining the function of the gene of interest.

BACKGROUND OF THE INVENTION

With the completion of the Human Genome Project approaching, there is an increasing interest in studying the function of genes, particularly those involved in human development and disease. While mapping and nucleotide sequencing of genes is an important first step for understanding the function of genes, the physical characterization of the structure of a gene does not provide insight into the function of that gene in the context of a multicellular organism.

Prior art approaches to determining gene function in mammals have relied on targeting mutations to specific genes in embryonic stem (ES) cells, or on genome-wide mutagenesis techniques designed to mutate all genes of an organism (e.g., mice). For example, “knock-out” mutations in ES cells have been widely used to target mutations to specific genes. “Knock-out” mutations shut off or alter gene expression and are currently used to produce a phenotype in the whole animal which reflects the function of the knocked-out gene. This approach has identified many genes which are associated with cancer and other human genetic diseases, and relies either on phenotype-based screens (i.e., screening for a particular phenotype) or on gene-based screens (i.e., screening for a particular alteration in the genome).

Phenotype-based screens have primarily been conducted using mice, and involve characterization of thousands of mutagenized mice for specific diseases and traits [Russell et al., Proc. Natl. Acad. Sci. USA 76:5818-5819, 1979; Hitotsumachi et al., Proc. Natl. Acad. Sci. USA 82:6619-6621; Shedlovsky et al., Genetics 134:1205-1210; Marker et al., Genetics 145:435-443, 1997]. While the phenotype-based approach has the advantage that no assumption is made with respect to which genes are associated with a given disease or disorder, it is nevertheless very costly when using organisms such as mice since it requires the maintenance of several lines of mutagenized whole organisms. Furthermore, it is unclear whether phenotype-based screens permit conducting saturation screens for both dominant and recessive mutations of all mouse genes.

Gene-based screens have been carried out in whole animals and in embryonic stem (ES) cells. This approach involves identifying the organism's genes or the ES cell genes which have been mutated. Homologous recombination and retroviral insertion are commonly used in ES cells [Zambrowicz et al. (1998) Nature 392:608-611]. Although mutagenesis by homologous recombination is becoming routine, it remains cumbersome and expensive. Similarly, while the genome-wide approach to mutagenizing ES cells by retroviral insertional mutagenesis allows the generation of a large number of mutagenized ES cells in a cost-effective manner, this approach produces only one, or a limited number of, alleles of a given gene. Additionally, the class of mutations that can be produced with this approach is limited to those mutations which result from integration of a retroviral element. Thus, mutations caused by, for example, single amino acid changes in the protein cannot be produced using this approach. In many instances, for example, it may be desirable to generate mutations which cause single amino acid changes that merely modify gene function (e.g., by generating hypomorphic alleles that express the gene with a reduced efficiency) or that give rise to a new trait in the animal (e.g., by generating dominant neomorphic alleles which result in a gain of function). The generation of hypomorphic and neomorphic alleles of a gene in a model organism by single amino acid substitutions may be desirable to create a model organism for a human trait or disease in which gene function is modified rather than destroyed.

Accordingly, what is needed are methods for determining gene function which may efficiently be applied on a genome-wide scale, which generate more than one mutation in a gene of interest, and which do not only abrogate the function of the gene.

SUMMARY OF THE INVENTION

The invention provides methods for generating an allelic series of modifications in any gene of interest contained in a cell using nucleic acid sequence-modifying agents. In particular, the invention provides a method of producing a modification in a gene of interest contained in a cell, comprising: a) providing: i) a plurality of target cells capable of being cultured; ii) an agent capable of producing at least one modification in the gene of interest in the target cell; b) treating the target cells with the agent under conditions such that a mixture of cells is produced, the mixture of cells comprising cells having an unmodified gene of interest and cells having a modified gene of interest; and c) isolating the cells having a modified gene of interest.

In one preferred embodiment, the methods of the invention further comprise step d) comparing the nucleotide sequence of the gene of interest in the cells having a modified gene of interest with the nucleotide sequence of the gene of interest in the cells having an unmodified gene of interest. In a more preferred embodiment, the methods further comprise e) manipulating the cells having a modified gene of interest to generate an organism comprising the modification in the gene of interest. In an alternative more preferred embodiment, the method further comprises prior to step d) amplifying the modified gene of interest to produce an amplified modified gene of interest. In yet a more preferred embodiment, the method further comprises prior to step d) sequencing the amplified modified gene of interest.

Without intending to limit the methods of the invention to any particular modification, in one embodiment, the modification is selected from the group consisting of mutation, mismatch, and strand break. In a preferred embodiment, the mutation is selected from the group consisting of deletion, insertion and substitution. In another preferred embodiment, the strand break is selected from the group consisting of single-strand break and double-strand break.

While it is not intended that the scope of the invention be limited to any particular type or source of target cell, in one embodiment, the target cell is derived from an organism selected from the group consisting of non-human animal, plant, protist, fungus, bacterium, and virus. In a preferred embodiment, the non-human animal is a mammal. In a more preferred embodiment, the mammal is a mouse. In an alternative preferred embodiment, the non-human animal is zebrafish. In another embodiment, the target cell is an embryonic stem cell.

The invention is not intended to be limited to any particular type or class of agent capable of producing at least one modification in the gene of interest. However, in one preferred embodiment, the agent is selected from the group consisting of N-ethyl-N-nitrosourea, methylnitrosourea, procarbazine hydrochloride, triethylene melamine, acrylamide monomer, chlorambucil, melphalan, cyclophosphamide, diethyl sulfate, ethyl methane sulfonate, methyl methane sulfonate, 6-mercaptopurine, mitomycin-C, procarbazine, N-methyl-N′-nitro-N-nitrosoguanidine, ³H₂O, urethane, ultraviolet light, X-ray radiation, and gamma-radiation.

The invention further provides a method of producing an allelic series of modification in a gene of interest contained in a cell, comprising: a) providing: i) a plurality of target cells capable of being cultured; ii) an agent capable of producing at least one modification in the gene of interest in the target cell; b) treating the target cells with the agent under conditions such that a mixture of cells is produced, the mixture of cells comprising cells having an unmodified gene of interest, cells having a first modification in the gene of interest, and cells having a second modification in the gene of interest; and c) isolating the cells having a first modification in the gene of interest and the cells having a second modification in the gene of interest, thereby producing an allelic series of modification in the gene of interest.

In one preferred embodiment, the method further comprises step d) comparing the nucleotide sequence of the gene of interest in the cells having an unmodified gene of interest with the nucleotide sequence of the gene of interest in cells selected from the group consisting of the cells having a first modification in the gene of interest and the cells having a second modification in the gene of interest. In a more preferred embodiment, the method further comprises e) manipulating cells selected from the group consisting of the cells having a first modification in the gene of interest and the cells having a second modification in the gene of interest to generate an organism comprising a modification selected from the group consisting of the first modification in the gene of interest and the second modification in the gene of interest. In an alternative more preferred embodiment, the method further comprises prior to step d) amplifying the gene of interest selected from the group consisting of the gene of interest having the first modification and the gene of interest having the second modification to produce amplified modified gene of interest selected from the group consisting of amplified gene of interest having the first modification and amplified gene of interest having the second modification. In yet a more preferred embodiment, the method further comprises prior to step d) sequencing the amplified modified gene of interest.

Without limiting the invention to any particular class or type of modification, in an alternative preferred embodiment, the first modification and the second modification are selected from the group consisting of mutation, mismatch, and strand break. In a more preferred embodiment, the mutation is selected from the group consisting of deletion, insertion and substitution. In an alternative preferred embodiment, the strand break is selected from the group consisting of single-strand break and double-strand break.

The invention is not limited to any particular type or source of target cell. However, in one preferred embodiment, the target cell is derived from an organism selected from the group consisting of non-human animal, plant, protist, fungus, bacterium, and virus. In a more preferred embodiment, the non-human animal is a mammal. In yet a more preferred embodiment, the mammal is a mouse. In an alternative preferred embodiment, the non-human animal is zebrafish. In another preferred embodiment, the target cell is an embryonic stem cell.

Without intending to limit the methods of the invention to any particular class or type of agent capable of producing at least one modification in the gene of interest, in one preferred embodiment, the agent is selected from the group consisting of N-ethyl-N-nitrosourea, methylnitrosourea, procarbazine hydrochloride, triethylene melamine, acrylamide monomer, chlorambucil, melphalan, cyclophosphamide, diethyl sulfate, ethyl methane sulfonate, methyl methane sulfonate, 6-mercaptopurine, mitomycin-C, procarbazine, N-methyl-N′-nitro-N-nitrosoguanidine, ³H₂O, urethane, ultraviolet light, X-ray radiation, and gamma-radiation.

The invention further provides a method of producing a modification in a gene of interest contained in a cell, comprising: a) providing: i) a plurality of target cells capable of being cultured; ii) an agent capable of producing at least one modification in the gene of interest in the target cell; b) treating the target cells with the agent under conditions such that a mixture of cells is produced, the mixture of cells comprising a cell having an unmodified gene of interest and two or more cells having a modified gene of interest, the two or more cells having different modifications in the gene of interest; and c) isolating the two or more cells having a modified gene of interest.

In one embodiment, the method further comprises step d) comparing the nucleotide sequence of the gene of interest in the cells having a modified gene of interest with the nucleotide sequence of the gene of interest in the cells having an unmodified gene of interest. In a more preferred embodiment, the method further comprises e) manipulating the cells having a modified gene of interest to generate an organism comprising the modification in the gene of interest. In an alternative preferred embodiment, the method further comprises prior to step d) amplifying the modified gene of interest to produce an amplified modified gene of interest. In yet a more preferred embodiment, the method further comprises prior to step d) sequencing the amplified modified gene of interest.

While not intending to limit the invention to any particular type of modification, in one embodiment, the modification is selected from the group consisting of mutation, mismatch, and strand break. In a preferred embodiment, the mutation is selected from the group consisting of deletion, insertion and substitution. In an alternative preferred embodiment, the strand break is selected from the group consisting of single-strand break and double-strand break.

The invention is not limited to any particular type or source of target cell. However, in one embodiment, the target cell is derived from an organism selected from the group consisting of non-human animal, plant, protist, fungus, bacterium, and virus. In a preferred embodiment, the non-human animal is a mammal. In a more preferred embodiment, the mammal is a mouse. In an alternative preferred embodiment, the non-human animal is zebrafish. In another embodiment, the target cell is an embryonic stem cell.

It is not intended that the invention be limited to the type or class of agent capable of producing at least one modification in the gene of interest. However, in one embodiment, the agent is selected from the group consisting of N-ethyl-N-nitrosourea, methylnitrosourea, procarbazine hydrochloride, triethylene melamine, acrylamide monomer, chlorambucil, melphalan, cyclophosphamide, diethyl sulfate, ethyl methane sulfonate, methyl methane sulfonate, 6-mercaptopurine, mitomycin-C, procarbazine, N-methyl-N′-nitro-N-nitrosoguanidine, ³H₂O, urethane, ultraviolet light, X-ray radiation, and gamma-radiation.

Definitions

To facilitate understanding of the invention, a number of terms are defined below.

The term “genomic sequence” refers to any deoxyribonucleic acid sequence located in a cell. Genomic sequences include, but are not limited to, structural genes, regulatory genes, and regulatory elements.

A “transgenic organism” as used herein refers to an organism whose germline cells have been altered by the introduction of a transgene. The term “transgene” as used herein refers to any nucleic acid sequence which is introduced into the genome of an organism by experimental manipulations. A transgene may be an “endogenous DNA sequence,” or a “heterologous DNA sequence” (i.e., “foreign DNA”). The term “endogenous DNA sequence” refers to a nucleotide sequence which is naturally found in the cell into which it is introduced so long as it does not contain some modification (e.g., a point mutation, the presence of a selectable marker gene, etc.) relative to the naturally-occurring sequence. The terms “heterologous DNA sequence” and “foreign DNA sequence” are used interchangeably herein to refer to a nucleotide sequence which is ligated to, or is manipulated to become ligated to, a nucleic acid sequence to which it is not ligated in nature, or to which it is ligated at a different location in nature. Heterologous DNA is not endogenous to the cell into which it is introduced, but has been obtained from another cell. Heterologous DNA also includes an endogenous DNA sequence which contains some modification relative to the endogenous DNA sequence. Generally, although not necessarily, heterologous DNA encodes RNA and proteins that are not normally produced by the cell in which it is expressed. Examples of heterologous DNA include reporter genes, transcriptional and translational regulatory sequences, selectable marker proteins (e.g., proteins which confer drug resistance), etc.

As used herein, the term “gene” means the deoxyribonucleotide sequences comprising the coding region of a structural gene and including sequences located adjacent to the coding region on both the 5′ and 3′ ends for a distance of several kb on either end such that the gene corresponds to the length of the full-length mRNA. The sequences which are located 5′ of the coding region and which are present on the mRNA are referred to as 5′ non-translated sequences. The sequences which are located 3′ or downstream of the coding region and which are present on the mRNA are referred to as 3′ non-translated sequences. A genomic form or clone of a gene contains coding sequences, termed “exons,” alternating with non-coding sequences termed “introns” or “intervening regions” or “intervening sequences.” Introns are segments of a gene which are transcribed into heterogeneous nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or “spliced out” from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. In addition to containing introns, genomic forms of a gene may also include sequences located on both the 5′ and 3′ end of the sequences which are present on the RNA transcript. These sequences are referred to as “flanking” sequences or regions (these flanking sequences are located 5′ or 3′ to the non-translated sequences present on the mRNA transcript). The 5′ flanking region may contain regulatory sequences such as promoters and enhancers which control or influence the transcription of the gene. The 3′ flanking region may contain sequences which direct the termination of transcription, posttranscriptional cleavage and polyadenylation.

As used herein the term “coding region” when used in reference to a structural gene refers to the nucleotide sequences which encode the amino acids found in the nascent polypeptide as a result of translation of an mRNA molecule. The coding region is bounded, in eukaryotes, on the 5′ side by the nucleotide triplet “ATG” which encodes the initiator methionine and on the 3′ side by one of the three triplets which specify stop codons (i.e., TAA, TAG, TGA).
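The boundaries described in this definition can be illustrated with a minimal Python sketch. The function name `find_coding_region` and the example sequence are hypothetical and are not part of the specification; the sketch simply takes the first “ATG” in a DNA string and reads in-frame triplets until one of the three stop codons is reached, ignoring introns and alternative start sites.

```python
# Hypothetical illustration: locate a coding region bounded by ATG and a stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_region(dna):
    """Return (start, end) indices of the first ATG...stop region, or None.

    `start` is the index of the "A" of ATG; `end` is one past the last base
    of the in-frame stop codon, so dna[start:end] includes both boundaries.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return None  # no initiator triplet present
    # Step through triplets in the reading frame set by the ATG.
    for i in range(start + 3, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None  # no in-frame stop codon found


if __name__ == "__main__":
    example = "CCATGGCCTAAGG"  # made-up sequence for illustration only
    region = find_coding_region(example)
    print(region, example[region[0]:region[1]])
```

In this sketch the region is reported as a half-open interval so that it can be used directly as a Python slice.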

As used herein, the term “structural gene” refers to a DNA sequence coding for RNA or a protein. In contrast, “regulatory genes” are structural genes which encode products (e.g., transcription factors) which control the expression of other genes.

As used herein, the term “regulatory element” refers to a genetic element which controls some aspect of the expression of nucleic acid sequences. For example, a promoter is a regulatory element which facilitates the initiation of transcription of an operably linked coding region. Other regulatory elements are splicing signals, polyadenylation signals, termination signals, enhancer elements, etc. Promoters and enhancers consist of short arrays of DNA sequences that interact specifically with cellular proteins involved in transcription [Maniatis, et al., Science 236:1237 (1987)]. Promoter and enhancer elements have been isolated from a variety of eukaryotic sources including genes in yeast, insect and mammalian cells and viruses (analogous control elements, i.e., promoters, are also found in prokaryotes). The selection of a particular promoter and enhancer depends on what cell type is to be used to express the protein of interest. Some eukaryotic promoters and enhancers have a broad host range while others are functional in a limited subset of cell types [for review see Voss, et al., Trends Biochem. Sci., 11:287 (1986) and Maniatis, et al., Science 236:1237 (1987)]. For example, the SV40 early gene enhancer is very active in a wide variety of cell types from many mammalian species and has been widely used for the expression of proteins in mammalian cells [Dijkema, et al., EMBO J. 4:761 (1985)]. Other examples of promoter/enhancer elements active in a broad range of mammalian cell types are those from the human elongation factor 1α gene [Uetsuki et al., J. Biol. Chem., 264:5791 (1989); Kim et al., Gene 91:217 (1990); and Mizushima and Nagata, Nuc. Acids. Res., 18:5322 (1990)] and the long terminal repeats of the Rous sarcoma virus [Gorman et al., Proc. Natl. Acad. Sci. USA 79:6777 (1982)] and the human cytomegalovirus [Boshart et al., Cell 41:521 (1985)].

The terms “gene of interest” and “nucleotide sequence of interest” refer to any gene or nucleotide sequence, respectively, the manipulation of which may be deemed desirable for any reason by one of ordinary skill in the art.

The term “expression vector” as used herein refers to a recombinant DNA molecule containing a desired coding sequence and appropriate nucleic acid sequences necessary for the expression of the operably linked coding sequence in a particular host organism. Nucleic acid sequences necessary for expression in prokaryotes include a promoter, optionally an operator sequence, a ribosome binding site and possibly other sequences. Eukaryotic cells are known to utilize promoters, enhancers, and termination and polyadenylation signals.

A “modification” as used herein in reference to a nucleic acid sequence refers to any change in the structure of the nucleic acid sequence. Changes in the structure of a nucleic acid sequence include changes in the covalent and non-covalent bonds in the nucleic acid sequence. Illustrative of these changes are mutations, mismatches, strand breaks, as well as covalent and non-covalent interactions between a nucleic acid sequence (which contains unmodified and/or modified nucleic acids) and other molecules. Illustrative of a covalent interaction between a nucleic acid sequence and another molecule are changes to a nucleotide base (e.g., formation of thymine glycol) and covalent cross-links between double-stranded DNA sequences which are introduced by, for example, ultraviolet radiation or by cis-platinum. Yet another example of a covalent interaction between a nucleic acid sequence and another molecule includes covalent binding of two nucleic acid sequences to psoralen following ultraviolet irradiation. Non-covalent interactions between a nucleic acid sequence and another molecule include non-covalent interactions of a nucleic acid sequence with a molecule other than a nucleic acid sequence and other than a polypeptide sequence. Non-covalent interactions between a nucleic acid sequence and a molecule other than a nucleic acid sequence and other than a polypeptide sequence are illustrated by non-covalent intercalation of ethidium bromide or of psoralen between the two strands of a double-stranded deoxyribonucleic acid sequence. The present invention contemplates modifications which cause changes in a functional property (or properties), such changes manifesting themselves at a variety of time points.

The term “allelic series” when made in reference to a gene refers to wild-type sequences of the gene. An “allelic series of modifications” as used herein in reference to a gene refers to two or more nucleic acid sequences of the gene, where each of the two or more nucleic acid sequences of the gene contains at least one modification when compared to the wild-type sequences of the gene.

As used herein, the term “mutation” refers to a deletion, insertion, or substitution. A “deletion” is defined as a change in a nucleic acid sequence in which one or more nucleotides is absent. An “insertion” or “addition” is that change in a nucleic acid sequence which has resulted in the addition of one or more nucleotides. A “substitution” results from the replacement of one or more nucleotides by a molecule which is a different molecule from the replaced one or more nucleotides. For example, a nucleic acid may be replaced by a different nucleic acid as exemplified by replacement of a thymine by a cytosine, adenine, guanine, or uridine. Alternatively, a nucleic acid may be replaced by a modified nucleic acid as exemplified by replacement of a thymine by thymine glycol.

The term “mismatch” refers to a non-covalent interaction between two nucleic acids, each nucleic acid residing on a different polynucleic acid sequence, which does not follow the base-pairing rules. For example, for the partially complementary sequences 5′-AGT-3′ and 5′-AAT-3′, a G-A mismatch is present.
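The 5′-AGT-3′ / 5′-AAT-3′ example can be checked with a short Python sketch. The function name `find_mismatches` is hypothetical; the sketch assumes two equal-length strands, each written 5′ to 3′, that pair in antiparallel orientation under the standard A-T and G-C base-pairing rules.

```python
# Hypothetical illustration: report positions that violate A-T / G-C pairing.
WATSON_CRICK_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def find_mismatches(strand_a, strand_b):
    """Return a list of (position, base_a, base_b) for each mismatch.

    Both strands are given 5'->3'. Because the strands pair antiparallel,
    strand_b is reversed before the position-by-position comparison.
    Positions are 0-based along strand_a.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch only handles strands of equal length")
    partner = strand_b.upper()[::-1]
    mismatches = []
    for position, (base_a, base_b) in enumerate(zip(strand_a.upper(), partner)):
        if (base_a, base_b) not in WATSON_CRICK_PAIRS:
            mismatches.append((position, base_a, base_b))
    return mismatches


if __name__ == "__main__":
    # The example from the definition above: a single G-A mismatch.
    print(find_mismatches("AGT", "AAT"))
```

Reversing the second strand is the only alignment step; the sketch does not attempt to handle gaps or strands of unequal length.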

The term “strand break” when made in reference to a double stranded nucleic acid sequence includes a single-strand break and/or a double-strand break. A single-strand break refers to an interruption in one of the two strands of the double stranded nucleic acid sequence. This is in contrast to a double-strand break which refers to an interruption in both strands of the double stranded nucleic acid sequence. Strand breaks may be introduced into a double stranded nucleic acid sequence either directly (e.g., by ionizing radiation) or indirectly (e.g., by enzymatic incision at a nucleic acid base).

The terms “nucleic acid” and “unmodified nucleic acid” as used herein refer to any one of the known four deoxyribonucleic acid bases (i.e., guanine, adenine, cytosine, and thymine). The term “modified nucleic acid” refers to a nucleic acid whose structure is altered relative to the structure of the unmodified nucleic acid. Illustrative of such modifications would be replacement covalent modifications of the bases, such as alkylation of amino and ring nitrogens as well as saturation of double bonds.

The term “modified cell” refers to a cell which contains at least one modification in the cell's genomic sequence.

The term “nucleic acid sequence-modifying agent” refers to an agent which is capable of introducing at least one modification into a nucleic acid sequence. Nucleic acid sequence-modifying agents include, but are not limited to, chemical compounds [e.g., N-ethyl-N-nitrosourea (ENU), methylnitrosourea (MNU), procarbazine hydrochloride (PRC), triethylene melamine (TEM), acrylamide monomer (AA), chlorambucil (CHL), melphalan (MLP), cyclophosphamide (CPP), diethyl sulfate (DES), ethyl methane sulfonate (EMS), methyl methane sulfonate (MMS), 6-mercaptopurine (6MP), mitomycin-C (MMC), procarbazine (PRC), N-methyl-N′-nitro-N-nitrosoguanidine (MNNG), ³H₂O, and urethane (UR)], and electromagnetic radiation [e.g., X-ray radiation, gamma-radiation, ultraviolet light].

The term “wild-type” when made in reference to a gene refers to a gene which has the characteristics of that gene when isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene. In contrast, the term “modified” or “mutant” refers to a gene or gene product which displays modifications in sequence and/or functional properties (i.e., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally-occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.

DESCRIPTION OF THE INVENTION

The invention provides methods for efficiently generating an allelic series of modifications in any gene of interest contained in a cell using nucleic acid sequence-modifying agents. The methods of the invention contemplate generation of an allelic series of modifications in more than one gene in the genome of a target cell, and preferably in substantially every gene within the genome of the target cell. The allelic series of modifications in any gene of interest may be analyzed in the manner described herein.

The methods of the invention also provide a set of cells which contain one or more modifications in substantially every gene within the genome of the cells (i.e., the “Library”). These methods allow screening the Library for modifications in a gene of interest prior to further using cells from the Library for determining the function of the gene of interest. The methods provided herein are particularly useful in modifying genes of interest in mammalian ES cells.

The methods of the invention involve treating a population of cells with one or more agents (e.g., nucleic acid sequence-modifying agents) which are capable of introducing one or more modifications in substantially every gene within the genome of the cells. The term “substantially every gene” refers to the statistical probability, preferably at least about 70% probability, more preferably at least about 85% probability, and most preferably at least about 95% probability, as determined by a standard Poisson distribution, that each gene in the genome contains at least one modification. The resulting mutant cells which contain the genomic modifications are clonally expanded (e.g., into 500,000 clones) to produce a library of clones, each clone containing at least one modification in at least one gene of interest. Each clone is expanded and screened in order to determine the DNA sequence of the modified gene of interest as well as to determine the type and location of the modification in the gene of interest. DNA sequencing may be performed by sequence analysis (preferably using automated sample processing with robotics technologies and molecular analysis using DNA chip technologies), or high throughput methods (e.g., single stranded conformational polymorphism (SSCP), chemical cleavage, and heteroduplex analysis). The nucleic acid sequences which are obtained for the one or more genes of interest from each clone provide a database by which each clone is uniquely identified. Thus, modified cells may be selected from the Library based on cross reference to the nucleic acid sequence data.
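The Poisson criterion in the definition of “substantially every gene” can be made concrete with a short Python sketch. It assumes, for illustration only, that the number of modifications per gene across the Library follows a Poisson distribution with mean `mean_hits` (a hypothetical parameter name); under that assumption the probability that a gene carries at least one modification is 1 − e^(−mean), and the mean needed to reach a target probability p is −ln(1 − p).

```python
# Hypothetical illustration of the Poisson criterion for "substantially every gene".
import math


def prob_at_least_one(mean_hits):
    """P(gene has >= 1 modification) when hits per gene are Poisson(mean_hits)."""
    if mean_hits < 0:
        raise ValueError("mean_hits must be non-negative")
    return 1.0 - math.exp(-mean_hits)


def mean_hits_required(target_probability):
    """Mean modifications per gene needed to reach the target probability."""
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be strictly between 0 and 1")
    return -math.log(1.0 - target_probability)


if __name__ == "__main__":
    # The three thresholds named in the text.
    for target in (0.70, 0.85, 0.95):
        needed = mean_hits_required(target)
        print(f"target {target:.0%}: about {needed:.2f} modifications per gene on average")
```

Under this assumption, the 95% threshold corresponds to an average of roughly three modifications per gene across the Library; how that average maps onto agent dose and clone number is not addressed by the sketch.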

Selected modified cells (i.e., cells containing one or more modifications in the gene of interest) may further be used to determine the function of the gene of interest. This may be accomplished by, for example, culturing the selected modified cells and determining changes in the modified cells' morphological, biochemical, and molecular biological characteristics as compared to those characteristics in wild-type cells (i.e., cells which had not been treated with the nucleic acid sequence-modifying agent). Alternatively, where the selected modified cells are capable of regenerating a multicellular organism (e.g., ES cells), these cells may be used to generate transgenic non-human organisms which are further investigated to determine morphological, biochemical, behavioral, histological, and molecular biological changes relative to control non-human organisms, i.e., non-human organisms which are generated from progenitor cells that had not been exposed to the nucleic acid sequence-modifying agent.

The modified cells which collectively contain an allelic series of modifications in a gene of interest, and which are generated by the methods disclosed herein, are useful in determining the function of the gene of interest. The usefulness of generating an allelic series of modifications in a gene of interest for the purpose of understanding the function of a gene is illustrated by the mouse agouti gene. An allelic series of mutations in the agouti gene was obtained as a result of several spontaneous mutations coupled with a large number of mutations that arose by chemical- and radiation-mutagenesis of the agouti gene at the Oak Ridge National Laboratory. This allelic series of agouti gene mutations was used to study the function of the agouti gene. The most recessive of these mutations, referred to as nonagouti (a), caused a completely black coat color, while the various dominant alleles, like viable yellow (A^(vy)), were neomorphs and were all associated with a yellow coat pigmentation. The dominant neomorphic alleles also caused the animals to develop a complex phenotype involving obesity, non-insulin dependent diabetes and other traits. Analysis of both the dominant and recessive alleles ultimately led to a better understanding of the function of the agouti gene as it relates to its ability to antagonize the melanocortin receptor.

The modified cells which collectively harbor an allelic series of modifications in substantially every gene are particularly useful in investigating diseases which are associated with more than one modification in a given gene. Several such diseases are known in the art including, for example, epithelial ovarian cancer, sporadic breast cancer, familial breast cancer, cystic fibrosis, and autosomal dominant polycystic kidney disease. For example, epithelial ovarian cancer has been associated with 45 mutations in exons 5-8 of the p53 gene. Overall, 72% of the mutations were transitions, 24% were transversions, and 4% were microdeletions. Allelic deletion of the other p53 allele was seen in 67% of ovarian cancers in which a p53 mutation was present [Kohler et al. (1993) J. Natl. Cancer Inst. 85(18):1513-1519]. Similarly, familial breast cancer has also been shown to be associated with over 200 distinct mutations in the BRCA1 gene, including missense and protein-truncating mutations [Greenman et al. (1998) Genes, Chromosomes & Cancer 21(3):244-249]. Cystic fibrosis was found to be associated with over 550 mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene [see, e.g., Zielenski and Tsui (1995) Ann. Rev. Genetics 29:777-807; Dean and Santis (1994) Hum. Genet. 93(4):364-368]. A list of the mutations associated with cystic fibrosis is available at http://www.genet.sickkids.on.ca/cftr. Another disease associated with several mutations in a given gene is autosomal dominant polycystic kidney disease (ADPKD) in which phenotypically indistinguishable traits are caused by mutations in at least three distinct autosomal genes, i.e., PKD1, PKD2 and PKD3 [Sessa et al. (1997) J. Nephrol. 10(6):295-310; Watnick et al. (1997) Hum. Molec. Genetics 6:1473-1481; Veldhuisen et al. (1997) 61:547-555].

The methods of the invention provide several advantages over prior art methods for determining gene function. For example, unlike the prior art's retroviral insertional mutagenesis approach [Zambrowicz et al. (1998) Nature 392:608-611], the methods provided herein are not limited to the identification of genes into which a retroviral sequence is capable of inserting. Instead, the methods provided herein exploit an approach in which different types of modifications of a gene of interest are randomly introduced into any part of that gene.

In particular, prior art methods which rely on insertion of a retroviral sequence into a gene suffer from the drawback that they result in only one class of mutations, i.e., insertions. Since many human diseases (e.g., cystic fibrosis and epithelial ovarian cancer) involve modifications other than only insertions in a gene (e.g., single nucleotide changes such as transitions, transversions, and deletions), prior art approaches which rely on retroviral insertional mutagenesis fail to generate major classes of mutations which are relevant to human diseases.

Moreover, the methods provided herein permit evaluation of the function of a gene of interest in a more rapid and more cost-effective manner than prior art methods which rely on investigating gene function through mutagenesis in whole animals. This is because the methods of the invention allow a preliminary screening of treated cultured cells for genomic modifications, and selection of only those clones which contain modifications in the gene of interest. Since screening and selection are performed using treated cultured cells rather than whole animals, thousands of treated cultured cells may rapidly be analyzed for genomic modifications using automated sample processing with robotics technologies and molecular analysis using DNA chip technologies. Furthermore, since treated cells may be recovered following cryopreservation, the cost of maintaining a cell line containing a modification in a gene of interest is substantially less than the cost of maintaining a line of transgenic animals in which the gene of interest is modified.

Additionally, unlike the phenotype-based screens of the prior art, the methods provided herein are not limited to only the identification of genes whose function is associated with expression of a phenotype in the non-human organism which is generated from the modified cell. Rather, because the methods of the invention preferably employ nucleic acid sequence-based screens instead of only phenotype-based screens, the methods provided herein allow selection of modified cells in which the modified gene of interest contains a modification which produces amorphic (recessive null) alleles, hypomorphic recessive alleles that express the gene with a reduced efficiency, hypermorphic recessive alleles that express the gene with greater than wild-type activity, as well as antimorphic (dominant negative) and neomorphic (gain of function) alleles.

The invention is further described under (1) Construction Of An Allelic Series Of Modifications In A Library Of Cell Clones Containing Genomic Modifications Using Nucleic Acid Sequence-Modifying Agents, (2) Accessing Clones In the Library, and (3) Determining Gene Function.

1. Construction Of An Allelic Series Of Modifications In A Library Of Cell Clones Containing Genomic Modifications Using Nucleic Acid Sequence-Modifying Agents

The methods of the invention are contemplated to involve treating cultured cells from any organism with one or more agents which are capable of introducing at least one modification into genomic sequences including, but not limited to, structural genes, regulatory genes, and regulatory elements. Treatment with the nucleic acid sequence-modifying agent is contemplated to produce the Library, i.e., a set of cells which collectively contain one or more modifications in substantially every gene within the genome of the cells. The methods of the invention are herein illustrated by, but not limited to, treatment of mouse embryonic stem cells with N-ethyl-N-nitrosurea, N-methyl-N′-nitro-N-nitrosoguanidine, or methyl methane sulfonate.

A. Organisms

The methods of the invention are not intended to be limited to the type of organism from which the cells to be treated with the nucleic acid sequence-modifying agent are derived. Rather, any organism in which the function of a genomic sequence is sought to be determined is contemplated to be within the scope of the invention. Such organisms include, but are not restricted to, non-human animals (e.g., vertebrates, invertebrates, mammals, fish, insects, etc.), plants (e.g., monocotyledon, dicotyledon, vascular, non-vascular, seedless, seed plants, etc.), protists (e.g., algae, ciliates, diatoms, etc.), fungi (including multicellular forms and the single-celled yeasts), bacteria (prokaryotic, eukaryotic, archaebacteria, etc.), and viruses. In a preferred embodiment the organism is a non-human animal. A “non-human animal” refers to any animal which is not a human and includes vertebrates such as rodents, non-human primates, ovines, bovines, ruminants, lagomorphs, porcines, caprines, equines, canines, felines, aves, etc. Preferred non-human animals are selected from the order Rodentia.

In yet a more preferred embodiment, the non-human animal is a mouse. The mouse offers several advantages with respect to modeling human disease. For example, most genes in humans have functional homologues in the mouse. In addition, because the anatomy and physiology of the mouse are similar to those of humans, the phenotypes of genetic diseases are very similar in the two organisms. Importantly, the mouse's small size, high reproductive capacity, extensive history of classical and molecular genetic analyses, and ease of genetic manipulation make it a preferred organism for studying the roles of genes in human disease.

In an alternative preferred embodiment, the non-human animal is a zebrafish. Zebrafish are a preferred model for the determination of DNA function in mammals for several reasons. First, zebrafish is a complex vertebrate species which contains a majority of those genes found in higher vertebrates such as man. Moreover, nucleotide sequences are highly conserved between analogous zebrafish and mammalian genes [Schulte-Merker et al. (1992) Development 116:1021-1032; Hermann et al. (1990) Nature 343:617-622; Smith et al. (1990) Cell 67:79-87; Blum et al. (1992) Cell 69:1097-1106; Izpisua-Belmonte et al. (1993) Cell 76:645-659; Blumberg et al. (1991) Science 253:194-196; Stachel et al. (1993) Development 117:1261-1274].

B. Cells

Any type of cell capable of being cultured is expressly included within the scope of this invention. The term “cell capable of being cultured” as used herein refers to a cell which is able to divide in vitro to produce two or more progeny cells. Such cells are exemplified by embryonic cells (e.g., embryonic stem cells, fertilized egg cells, 2-cell embryos, protocorm-like body cells, callus cells, etc.), adult cells (e.g., brain cells, fruit cells, etc.), undifferentiated cells (e.g., fetal cells, tumor cells, etc.), differentiated cells (e.g., skin cells, liver cells, etc.), and the like.

In a preferred embodiment, the cell is capable of regenerating a multicellular organism. The use of such cells for the determination of gene function is preferred since multicellular organisms provide a living in vivo system in which the effects of modifying a gene of interest much more closely resemble those in a living organism as compared to in vitro cultured cells. In a more preferred embodiment, the cells are capable of contributing to the germline of the regenerated multicellular organism.

In a particularly preferred embodiment, the cell which is treated with the nucleic acid sequence-modifying agent and which is used to regenerate a multicellular organism is an embryonic stem (ES) cell. ES cells are pluripotent cells directly derived from the inner cell mass of blastocysts [Evans et al., (1981) Nature 292:154-156; Martin (1981) Proc. Natl. Acad. Sci. USA 78:7634-7638; Magnuson et al., (1982) J. Embryo. Exp. Morph. 81:211-217; Doetschman et al., (1988) Dev. Biol. 127:224-227], from inner cell masses [Tokunaga et al., (1989) Jpn. J. Anim. Reprod. 35:113-178], from disaggregated morulae [Eistetter, (1989) Dev. Gro. Differ. 31:275-282] or from primordial germ cells [Matsui et al., (1992) Cell 70:841-847; Resnick et al., (1992) Nature 359:550-551]. These cells give rise to the endodermal, ectodermal, and mesodermal compartments [Doetschman et al. (1985) J. Embryol. Exp. Morphol. 87:27].

Embryonic stem cells are preferred for determining the function of a gene since they offer a number of advantages. For example, ES cells are capable of forming permanent cell lines in vitro, thus providing an unlimited source of genetic material. Importantly, because the genetic material of ES cells may be introduced into the germline of a regenerated multicellular organism, ES cell genes which are modified by the methods disclosed herein may be introduced into the germline of regenerated multicellular organisms, thus offering the opportunity to determine the function of the genes in the organism.

In a particularly preferred embodiment, the ES cells are mouse ES cells. Mouse ES cells are available from the ATCC and can be cultured over at least sixty passages and typically retain a normal karyotype. Embryonic stem cells have been shown to remain in undifferentiated form in vitro if maintained on embryonic fibroblast feeder cell layers. In cell suspension, they will begin differentiation, containing elements of glandular, heart, skeletal smooth muscle, nerve, keratin-producing cells, and melanocytes [Doetschman et al. (1988) Dev. Biol. 127:224-227].

An additional advantage to using ES cells is that they are the most pluripotent cultured animal cells known. For example, when ES cells are injected into an intact blastocyst cavity or under the zona pellucida at the morula stage, ES cells are capable of contributing to all somatic tissues including the germ line in the resulting chimeras [reviewed by Bradley (1990) Curr. Op. Cell. Biol. 2:1013-1017; see also Lallemand et al. (1990) Development 110:1241-1248; Bradley et al. (1984) Nature 309:255-256; Gossler et al. (1986) Proc. Natl. Acad. Sci. USA 83:9065-9069; Robertson et al. (1986) Nature 323:445-448]. Indeed, their ability to colonize the various embryonic tissues is not equal. They are able to extensively colonize fetal tissues and extraembryonic mesoderm, but may be restricted in their capability to contribute to trophectoderm and primitive endoderm derivatives [Beddington et al. (1989) Development 105:733-737; Suemori et al. (1990) Cell Differ. Dev. 29:181-186].

Embryonic stem cell-like cells have been obtained from pig [Strojek et al. (1990) Theriogenology 33:901-914; Piedrahita et al. (1990) Theriogenology 34:879-901], sheep [Piedrahita et al. (1990) Theriogenology 34:879-901; Notarianni et al. (1991) J. Reprod. Fert. (Suppl.) 43:255-260], cattle [Saito et al. (1992) Roux's Arch. Dev. Biol. 201:134-141; Stice et al. (1996) Biol. Reprod. 54:100-110; Talbot et al. (1995) 42:35-52], American mink [Sukoyan et al. (1992) Mol. Reprod. Dev. 33:418-431], rat [Brenin et al. (1997) Transplant Proc. 29:1761-1765; Iannaccone et al. (1994) Dev. Biol. 163:288-292], hamster [Doetschman et al. (1988) Dev. Biol. 127:224-227], and zebrafish [Sun et al. (1995) Mol. Mar. Biol. Biotechnol. 4:193-199]. In addition, the international application WO 90/03432 discloses pluripotential embryonic stem cell-like cells derived from porcine and bovine species. This international application describes the production of pluripotential ungulate embryonic stem cells, together with details of the morphology enabling recognition of the cells.

Yet a further advantage to using ES cells to determine gene function is that ES cells can be cultured and manipulated in vitro and then returned to the embryonic environment to contribute to all tissues including the germ line [for a review, see Robertson (1986) Trends in Genetics 2:9-13; Evans (1989) Mol. Bio. Med. 6:557-565; Johnson et al. (1989) Fetal Ther. 4 (Suppl.) 1:28-39; Babinet et al. (1989) Genome 31:938-949]. Not only can embryonic stem cells propagated in vitro contribute efficiently to the formation of chimeras, including germ line chimeras, but in addition, these cells can be manipulated in vitro without losing their capacity to generate germ line chimeras [Robertson et al. (1986) Nature 323:445-447].

C. Nucleic Acid Sequence-Modifying Agents

The methods of the invention are contemplated to include within their scope any agent which is capable of introducing a modification into the genome of a cell. These agents are exemplified by chemicals and electromagnetic radiation. Exemplary chemicals are described at http://dir.niehs.nih.gov/dirtb/dirrtg/chemicalsstudiedindex2.htm including, but not limited to, N-ethyl-N-nitrosurea (ENU), methylnitrosourea (MNU), procarbazine hydrochloride (PRC), triethylene melamine (TEM), acrylamide monomer (AA), chlorambucil (CHLI), melphalan (MLP), cyclophosphamide (CPP), diethyl sulfate (DES), ethyl methane sulfonate (EMS), methyl methane sulfonate (MMS), 6-mercaptopurine (6MP), mitomycin-C (MMC), procarbazine (PRC), N-methyl-N′-nitro-N-nitrosoguanidine (MNNG), ³H₂O, and urethane (UR) [see, e.g., Russell et al., Factors affecting the nature of induced mutations, In “Biology of Mammalian Germ Cell Mutagenesis,” Banbury Report 34, Cold Spring Harbor Laboratory Press (1990), pp. 271-289; Rinchik (1991) Trends in Genetics 7(1); Marker et al. (1997) Genetics 145:435-443]. Electromagnetic radiation is exemplified by ultraviolet light, X-ray radiation, gamma-radiation, etc.

In one preferred embodiment, the nucleic acid sequence-modifying agent is N-ethyl-N-nitrosurea (ENU). ENU is a preferred compound for generating an allelic series of modifications in a cell's genome because it can produce point mutations (and, less frequently, small deletions) at sites throughout a gene, and is at least 10-fold more efficient in generating mutations than other agents [Russell et al., Proc. Natl. Acad. Sci. USA 76, 5818-5819, 1979; Hitotsumachi et al., PNAS 82, 6619-6621; Shedlovsky et al., Genetics 134, 1205-1210; Marker et al., Genetics 145, 435-443, 1997]. In addition, ENU has been shown to be effective in inducing mutations in undifferentiated embryonal carcinoma (EC) cells [Sehlmeyer and Wobus (1994) Mutation Res. 324:69-76], and in differentiated cell lines such as Chinese hamster ovary (CHO), V79, mouse S49, and GRSLI3-2 cells, as well as human lymphoblasts and lymphocytes [Shibuya and Morimoto, Mutation Res. (1993) 297:3-38]. Moreover, ENU has also been shown to produce antimorphic (dominant negative) and neomorphic (gain of function) alleles [Justice et al. (1988) Genet. Res. 51:95-102; Vitaterna et al. (1994) Science 264:719-725]. Furthermore, based on data provided herein, it is contemplated by the inventors that at least one point mutation may be introduced into every gene by treating as few as from about 200 to about 600 cells with ENU.

One of skill in the art appreciates that the frequency of modification with a given nucleic acid sequence-modifying agent may be manipulated by treating the target cells with different doses of the agent and/or at multiple times. One of skill in the art also appreciates that treating cells with nucleic acid sequence-modifying agents involves using a non-toxic concentration (i.e., a concentration which does not kill all the treated cells or which does not destroy the treated cells' ability to regenerate a multicellular organism). Such a concentration may empirically be determined and is within the ordinary skill in the art. For example, a 60-mm dish is seeded with cells and the cells are incubated in growth medium for about 18-24 hours prior to treatment with different concentrations of the nucleic acid sequence-modifying agent. Treated cells are washed with phosphate buffered saline and further incubated in culture medium. An increase in the number of treated cells following incubation in culture medium as compared to the number of cells at the time of treatment indicates that the concentration of the nucleic acid sequence-modifying agent is non-toxic. Also, the ability of treated cells to regenerate multicellular organisms may readily be determined using methods known in the art which depend on the type of treated cell used (described infra).

2. Accessing Clones In the Library

The Library which is generated following treatment of cells with the nucleic acid sequence-modifying agent provides a database of genomic sequences. The Library contains a pool of clones, each clone containing a unique set of genomic modifications.

The pool of clones in the Library may be arrayed as single clones. One of skill in the art appreciates that the number of clones may be governed by the estimated number of genes in the genome of the treated cells. For example, the genome of mouse ES cells contains approximately 100,000 genes. Thus, approximately 500,000 individual clones from the Library may be arrayed using, for example, one thousand 96-well microtiter dishes.

Cells from the arrayed Library may be replica plated and cells from one set of replica plates frozen (i.e., to generate a master set) for future recovery of viable modified cells. Another set is screened for the presence of the modified gene of interest using any one of a number of methods known in the art.

For example, the modification in the gene of interest may be determined by amplification of the modified gene of interest by, for example, the polymerase chain reaction using primers which are specific for the modified gene of interest, coupled with sequencing of the PCR-amplified sequence.

The term “amplifying” and its grammatical equivalents are defined as the production of additional copies of a nucleic acid sequence and is generally carried out using polymerase chain reaction technologies well known in the art [Dieffenbach C W and G S Dveksler (1995) PCR Primer, a Laboratory Manual, Cold Spring Harbor Press, Plainview N.Y.]. As used herein, the term “polymerase chain reaction” (“PCR”) refers to the method of K. B. Mullis disclosed in U.S. Pat. Nos. 4,683,195, 4,683,202 and 4,965,188, all of which are hereby incorporated by reference, which describe a method for increasing the concentration of a segment of a target sequence in a mixture of genomic DNA without cloning or purification. This process for amplifying the target sequence consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double stranded target sequence. To effect amplification, the mixture is denatured and the primers then annealed to their complementary sequences within the target molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer annealing and polymerase extension can be repeated many times (i.e., denaturation, annealing and extension constitute one “cycle”; there can be numerous “cycles”) to obtain a high concentration of an amplified segment of the desired target sequence. The length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as the “polymerase chain reaction” (hereinafter “PCR”). Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified.”
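The effect of repeating the cycle can be illustrated with a short calculation. The following is a minimal sketch which assumes an idealized reaction in which each cycle multiplies the number of target copies by (1 + efficiency); the function name `pcr_copies` and the `efficiency` parameter are illustrative assumptions and are not part of the method described above.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after a given number of PCR cycles.

    Assumption (not from the specification): every cycle multiplies the
    copy number by (1 + efficiency), where efficiency = 1.0 means perfect
    doubling and 0.0 means no amplification.
    """
    if initial_copies < 0 or cycles < 0:
        raise ValueError("initial_copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # A single template copy, perfect doubling, 30 cycles.
    print(f"{pcr_copies(1, 30):.3e} copies after 30 cycles")
    # The same reaction at a hypothetical 90% per-cycle efficiency.
    print(f"{pcr_copies(1, 30, efficiency=0.9):.3e} copies after 30 cycles")
```

Under these assumptions a single copy of the target sequence is amplified roughly a billion-fold in 30 cycles, which is why the amplified segment becomes the predominant sequence in the mixture.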

With PCR, it is possible to amplify a single copy of a specific target sequence in genomic DNA to a level detectable by several different methodologies (e.g., hybridization with a labeled probe; incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; and/or incorporation of ³²P-labeled deoxyribonucleotide triphosphates, such as dCTP or dATP, into the amplified segment). For example, the PCR amplified fragments may then be analyzed for the presence of mutations in the gene of interest using a variety of techniques, e.g., nucleotide sequencing, SSCP, etc. [see, e.g., Greenman et al. (1998) Genes, Chromosomes & Cancer 21:244-249]. Briefly, the gene of interest is amplified by PCR using radioactive primers. PCR-amplified sequences are separated by gel electrophoresis, and the gels are dried and autoradiographed. The patterns of migrating PCR-amplified bands from modified cells and control cells (i.e., cells which had not been treated with the nucleic acid sequence-modifying agent) are compared. Differences in the patterns of migrating PCR-amplified bands indicate the presence of at least one modification in the PCR-amplified gene of interest.

Once a positive individual clone is identified, the type and location of the modification caused by the nucleic acid sequence-modifying agent in the gene of interest is determined by sequencing the gene of interest in the clone and comparing the sequence from the clone with the wild-type sequence of the gene of interest. Sequencing nucleic acid sequences is within the skill of the art and may be accomplished by, for example, using commercially available automated sequencers such as ABI 373 DNA sequencer using “GENESCAN 672” software.
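The comparison of a clone's sequence with the wild-type sequence can be sketched in a few lines of code. The example below is illustrative only: it assumes that the two sequences have already been aligned and are of equal length, so it reports single nucleotide substitutions (transitions and transversions) but not deletions or insertions. The function names and the short sequences used are hypothetical.

```python
from typing import List, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("bases are identical; no substitution to classify")
    both = {ref, alt}
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if both <= PURINES or both <= PYRIMIDINES:
        return "transition"
    return "transversion"


def find_point_mutations(wild_type: str, clone: str) -> List[Tuple[int, str, str, str]]:
    """Return (1-based position, wild-type base, clone base, class) per mismatch.

    Assumes pre-aligned DNA sequences of equal length (an assumption of
    this sketch, not a requirement stated in the specification).
    """
    wild_type, clone = wild_type.upper(), clone.upper()
    if len(wild_type) != len(clone):
        raise ValueError("sequences must be aligned and of equal length")
    valid = PURINES | PYRIMIDINES
    if not (set(wild_type) <= valid and set(clone) <= valid):
        raise ValueError("sequences may contain only A, C, G and T")
    return [
        (pos, w, c, classify_substitution(w, c))
        for pos, (w, c) in enumerate(zip(wild_type, clone), start=1)
        if w != c
    ]


if __name__ == "__main__":
    # Hypothetical wild-type and clone sequences for illustration.
    for hit in find_point_mutations("ATGCCGTA", "ATGTCGAA"):
        print(hit)
```

Detecting deletions would additionally require a sequence alignment step, which is omitted here for brevity.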

3. Determining Gene Function

The allelic series of modifications in the genomic sequence of interest may be used to determine the function of the genomic sequence by, for example, determining the effect of the allelic modifications in cultured cells or in multicellular organisms which are generated to contain one or more of the allelic modifications as described below.

A. Cultured Cells

Cells from the Library which contain at least one modification in the gene of interest may be used either directly or indirectly to determine the function of the gene of interest. For example, a clone of cells from the Library may be directly cultured in order to determine the function of the genomic sequence in that cell (e.g., by determining biochemical, molecular biological, and/or morphological changes in the cultured clone). This approach is most useful in evaluating the phenotype of dominant mutations in the cells. The function of recessive mutations in a gene of interest may be determined by generating cells which are homozygous or hemizygous with respect to the gene of interest using methods known in the art.

B. Regenerating Multicellular Organisms

In a preferred embodiment, where the cloned cell which contains the modified gene of interest is a cell (e.g., fertilized egg cell, ES cell, etc.) capable of regenerating a multicellular organism, the function of the gene may be evaluated by passing the modification through the germline of a non-human organism generated from the cloned cell using methods known in the art.

For example, where the cell treated with the nucleic acid sequence-modifying agent is a fertilized egg cell of a mammal, transgenic mammals are generated by implanting the treated fertilized egg cell into the uterus of a pseudopregnant female and allowing the cell to develop into an animal. This method has been successful in producing transgenic mice, sheep, pigs, rabbits and cattle [Jaenisch (1988) supra; Hammer et al., (1986) J. Animal Sci. 63:269; Hammer et al., (1985) Nature 315:680-683; Wagner et al., (1984) Theriogenology 21:29].

Additionally, where the fertilized egg cell which is treated with the nucleic acid sequence-modifying agent is derived from a fish (e.g., zebrafish), transgenic zebrafish may be generated by allowing the fertilized egg cell to develop without the need for attention from its parents. Development of a zebrafish from a fertilized egg occurs over a 96 h period, during which the developing embryo is transparent, thus facilitating observation of its tissues and organs. Following completion of embryogenesis, sexually mature adult zebrafish may develop at the age of two months, depending on their nutritional condition. The well characterized zebrafish developmental periods and events occurring therein facilitate the detection of alterations in these events by modifications to the gene of interest, thus allowing a determination of gene function.

Alternatively, where the cell treated with the nucleic acid sequence-modifying agent is an ES cell, multicellular organisms may be generated by introducing the modified ES cell back into the embryonic environment for expression and subsequent transmission to progeny animals. The most commonly used method is the injection of several ES cells into the blastocoel cavity of intact blastocysts [Bradley et al., (1984) Nature 309:255-256]. Alternatively, a clump of ES cells may be sandwiched between two eight-cell embryos [Bradley et al., (1987) in “Teratocarcinomas and Embryonic Stem Cells: A Practical Approach,” Ed. Robertson E. J. (IRL, Oxford, U.K.), pp. 113-151; Nagy et al., (1990) Development 110:815-821]. Both methods result in germ line transmission at high frequency.

While it is preferred that the multicellular organisms generated from the cells which are treated with the nucleic acid sequence-modifying agent are transgenic organisms (i.e., organisms which contain a transgene in a germ-line cell), the invention also expressly contemplates chimeric organisms (i.e., organisms which contain a transgene in only somatic cells).

The regenerated animals, whether heterozygous or homozygous for a modification in a gene of interest, and whether they contain a modified gene of interest in a somatic and/or germline cell, may be used to determine the function of the gene of interest. For example, morphological and pathological changes relative to wild-type animals may be determined using methods known in the art such as by visual inspection, histological staining, electron microscopy, magnetic resonance imaging (MRI), computerized tomography (CT) scans and the like. Morphological changes as a result of the modification in the gene of interest indicate that the gene which is modified by the nucleic acid sequence-modifying agent is important in the formation of the structure whose morphology is altered by the gene modification.

Alternatively, changes may be biochemical. Biochemical changes may be determined by, for example, changes in the activity of known enzymes, or in the rate of accumulation or utilization of certain substrates, etc. Such changes in response to modification of the gene suggest that the gene product acts in the same pathway as the enzymes whose activity is altered, or in a related pathway which either supplies substrate to these pathways, or utilizes products generated by them.

Yet another alternative is the determination of behavioral changes in an organism. Where the organism is unicellular, e.g., a yeast cell or bacterium, such changes may include light tropism, chemical tropism and the like, and would suggest that the gene of interest regulates these events. Where behavioral changes are observed in a multicellular organism, e.g., loss of spatial memory, aggressiveness, etc., such changes indicate that the gene of interest functions in a neural pathway involved in controlling such behavior.

Other changes include molecular biological changes, e.g., in the levels of expression of genes as determined by, for example, subtraction hybridization. Such changes suggest that the gene which is modified by the nucleic acid sequence-modifying agent encodes a transcriptional regulatory molecule such as a transcription factor.

EXPERIMENTAL

The following examples serve to illustrate certain preferred embodiments and aspects of the present invention and are not to be construed as limiting the scope thereof.

Example 1 N-ethyl-N-nitrosurea-Induced Mutations In the Hypoxanthine Phosphoribosyl Transferase (Hprt) Gene Of Mouse Embryonic Stem Cells

This experiment was carried out in order to determine whether N-ethyl-N-nitrosurea (ENU) induces mutations in the hypoxanthine guanine phosphoribosyltransferase (Hprt) gene in mouse embryonic stem (ES) cells which result in inactivation of the encoded enzyme. The rates of spontaneous mutations and of mutations caused by treatment of the ES cells with different concentrations of ENU were compared as follows.

Mouse ES cells (129/Sv+^(Tyr), +^(P)) were used. Cells were grown in culture medium [MEM (Gibco) supplemented with 15% heat-inactivated fetal calf serum, LIF, and 10 μM β-mercaptoethanol]. Cells were routinely grown on 100-mm petri dishes and cultivated at 37° C. in a humidified atmosphere of 5% CO₂ in air. For subculturing, cells were dissociated with HBSS containing 0.25% trypsin and 0.02% EDTA. After a 3-minute incubation at room temperature, cells were resuspended in culture medium and cell numbers determined using a hemocytometer.

The plating efficiency was determined by plating 2×10³ cells per 100-mm dish, fixing the cells after 6 days with methanol, staining with 0.01% crystal violet and scoring colonies. Colonies were scored for the calculation of plating efficiency as percent of the inoculated cell number. The plating efficiencies were approximately 20%.
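The plating efficiency calculation described above is a simple percentage. The following minimal sketch shows the arithmetic; the function name and the colony count of 400 are hypothetical values chosen so that the result matches the approximately 20% reported.

```python
def plating_efficiency(colonies_scored: int, cells_inoculated: int) -> float:
    """Plating efficiency as a percentage of the inoculated cell number."""
    if cells_inoculated <= 0:
        raise ValueError("cells_inoculated must be positive")
    if colonies_scored < 0:
        raise ValueError("colonies_scored must be non-negative")
    return 100.0 * colonies_scored / cells_inoculated


if __name__ == "__main__":
    # Hypothetical count: 400 colonies scored from 2 x 10^3 inoculated cells.
    print(f"{plating_efficiency(400, 2000):.0f}%")
```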

The spontaneous frequency of mutations which inactivated Hprt was determined by trypsinizing actively growing ES cells (i.e., 3 days following subculture) as described above, seeding duplicate 100-mm petri dishes with 1×10⁶ cells, and culturing the cells in culture medium containing 10 μM 6-thioguanine (6-TG). The number of surviving colonies after 7 days of culture was one colony per plate, i.e., the spontaneous mutation frequency at the Hprt locus (taking into account plating efficiency) was 1/400,000 cells.

The frequency of ENU-induced mutations which inactivate Hprt in ES cells was determined using two protocols as follows.

A. Protocol 1

In the first protocol, actively growing ES cells were trypsinized and plated at 5×10⁵ cells per T25 flask and, after 1 day of preincubation, treated for 5 hours with 0.3 mg/ml, 0.4 mg/ml or 0.5 mg/ml ENU (dissolved in medium without FCS by vigorous shaking and filter sterilization with 0.2 μm cellulose acetate filters immediately before use). Control cells and cells which had been treated with ENU were washed three times with PBS and the number of surviving cells determined after 3 days of culture in culture medium by plating 2,000 cells from control and from ENU-treated flasks and counting surviving colonies as described above. Treatment with 0.3 mg/ml, 0.4 mg/ml and 0.5 mg/ml ENU resulted in 10%, less than 5% and 0% survival, respectively. Surviving ES cells which had been treated with 0.3 mg/ml ENU were subcultured at 3:1 onto 60-mm petri dishes in the presence (set 1) and absence (set 2) of primary embryonic fibroblasts (PMEFs). ES cells grown on PMEFs (set 1) were cultured for 3 days, subcultured at 3:1 onto 60-mm petri dishes without PMEFs, trypsinized, plated at 4×10⁵ cells per 100-mm plate without PMEFs and grown for 2 days. ES cells grown in the absence of PMEFs (set 2) were trypsinized after 5 days in culture and plated at 4×10⁵ cells per 100-mm plate without PMEFs. Cells (set 1 and set 2) were subsequently grown for 1 day prior to the addition of 10 μM 6-TG to the culture medium and further incubated for an additional 6 days in the presence of 6-TG. The number of 6-TG resistant colonies was calculated, and individual colonies were separately plated in a 96-well microtiter dish. The results are shown in Table 1.

TABLE 1
Colonies Surviving 6-TG Selection (Hprt deficient)

                             No. Surviving Colonies   No. Surviving Colonies
Plate No.                    (+PMEF plates)           (−PMEF plates)
1                            3                        12
2                            2                        9
3                            6                        15
4                            3                        12
5                            2                        14
6                            5                        15
Total colonies               21                       77
Total No. cells plated       24 × 10⁵                 24 × 10⁵
Plating Efficiency (20%)     48 × 10⁴                 48 × 10⁴
ENU survival (10%)           48 × 10³                 48 × 10³
Mutation Frequency
  Minimum                    1/8,000                  1/8,000
  Maximum                    1/2,000                  1/623

The results in Table 1 show that ENU induces mutations in the Hprt gene of mouse ES cells at a frequency which is from 5-fold to 650-fold greater than the rate of spontaneous mutation in the Hprt gene. The mutation frequency could not be more accurately determined since ENU-treated cells were trypsinized and replated before selection was started, thus resulting in loss of clonality, and estimation of only a range of mutation frequency.
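The bounds reported in Table 1 can be reproduced with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the minimum frequency counts one independent mutant per colony-bearing plate (all colonies on a plate treated as siblings) while the maximum counts every colony as an independent mutant. That reading is consistent with the tabulated values but is not spelled out in the text.

```python
# Illustrative sketch: reproduce the Table 1 frequency bounds.
# Assumption (not stated explicitly in the text): minimum = one mutant per
# colony-bearing plate, maximum = every colony is an independent mutant.

PLUS_PMEF = [3, 2, 6, 3, 2, 5]          # colonies per plate, +PMEF
MINUS_PMEF = [12, 9, 15, 12, 14, 15]    # colonies per plate, -PMEF


def effective_cells(cells_plated, plating_efficiency=0.20, enu_survival=0.10):
    """Cells expected to remain clonogenic after plating and ENU treatment."""
    return cells_plated * plating_efficiency * enu_survival


def frequency_bounds(colonies_per_plate, cells_plated):
    """Return (min_denominator, max_denominator) for '1 in N' frequencies."""
    n_eff = effective_cells(cells_plated)
    positive_plates = sum(1 for c in colonies_per_plate if c > 0)
    total_colonies = sum(colonies_per_plate)
    return n_eff / positive_plates, n_eff / total_colonies


for label, counts in (("+PMEF", PLUS_PMEF), ("-PMEF", MINUS_PMEF)):
    # 6 plates at 4 x 10^5 cells each = 24 x 10^5 cells plated per set
    lo, hi = frequency_bounds(counts, cells_plated=6 * 4e5)
    print(f"{label}: minimum 1/{round(lo):,}  maximum 1/{round(hi):,}")
```

Under these assumptions the sketch gives a minimum of 1/8,000 for both sets and maxima of about 1/2,286 and 1/623, which match the rounded entries in Table 1.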

B. Protocol 2

In the second protocol, actively growing ES cells were trypsinized and plated at two densities (1×10⁵ and 0.5×10⁵ cells/well) in 24-well tissue culture plates instead of T25 flasks. After a one-day preincubation period, cells were treated for 5 hours with 0.3 mg/ml or 0.4 mg/ml ENU (as described above) and subsequently treated as described in Table 2.

TABLE 2
Protocol for Treating ES Cells with ENU

Treatment groups: 0.3 mg/ml ENU (Plate 1, Plate 2); 0.4 mg/ml ENU (Plate 3, Plate 4)

Days after plating

5
  Split 1:3. Passed one-third to new well, discarded remaining two-thirds.
  Discarded, very few colonies surviving.
6
  Split 1:3.
7
  Split 1:2.
9
  Plated one-third of each well onto separate 60-mm dishes.
  Plated half of each well onto separate 60-mm dishes.
10
  Added 6-thioguanine.
  Added 6-thioguanine.
  Plated half of each well onto separate 60-mm dishes.
16
  9 plates yielded colonies, 15 plates had no colonies.
    No. colonies/plate: Plate 1: 1 colony; Plate 2: 3 colonies; Plate 3: 5 colonies; Plate 4: 4 colonies; Plate 5: 1 colony; Plate 6: 1 colony; Plate 7: 2 colonies; Plate 8: 2 colonies; Plate 9: 4 colonies. Total: 23 colonies.
  6 plates yielded colonies, 18 plates had no colonies.
    No. colonies/plate: Plate 1: 2 colonies; Plate 2: 1 colony; Plate 3: 2 colonies; Plate 4: 1 colony; Plate 5: 4 colonies; Plate 6: 2 colonies. Total: 12 colonies.
17
  2 plates yielded colonies, 22 plates had no colonies.
    No. colonies/plate: Plate 1: 14 colonies; Plate 2: 18 colonies. Total: 22 colonies.

                             Plate 1      Plate 2      Plate 3      Plate 4
Total No. cells plated       2.4 × 10⁶    1.2 × 10⁶    NA           1.2 × 10⁶
Plating efficiency (20%)     4.8 × 10⁵    2.4 × 10⁵                 2.4 × 10⁵
ENU survival (10%)           4.8 × 10⁴    2.4 × 10⁴                 2.4 × 10⁴
Mutation frequency
  minimum                    1/5,000      1/3,000                   1/9,000
  maximum                    1/2,000      1/1,500                   1/800

By plating the cells into 24-well plates prior to ENU treatment, more individual populations were tested, yielding approximately the same mutation frequency as in the first protocol (i.e., from about 1/600 to about 1/9,000).

These results establish that ENU induces mutations in the Hprt gene at a frequency of from about 1/600 to about 1/9,000. The frequency of mutations induced by ENU at somatic genes is expected to be higher than the experimentally determined frequency of 1/600 since the above experimental design selects for cells which are deficient in HPRT activity, thus failing to detect Hprt mutations that result in little or no change in enzyme activity. Furthermore, the frequency of chemically-induced mutations at the Hprt gene has been found in mammalian cell specific-locus mutation assays to be lower than the frequency at other genes, e.g., the tk+/−.
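To give a sense of scale for the frequencies above, the following Python sketch estimates how many independently mutagenized clones would need to be screened to have a given probability of recovering at least one clone mutated at a locus. This is a simple binomial model offered as an assumption: it treats clones as independent and the per-locus frequency as uniform, it is not a calculation taken from the experiments above, and the function names are hypothetical.

```python
import math

# Illustrative sketch under a simple binomial model (an assumption):
# each clone independently carries a mutation at the locus with
# probability f, so P(at least one in N clones) = 1 - (1 - f)**N.


def prob_at_least_one(mutation_frequency, n_clones):
    """Probability that at least one of n_clones is mutant at the locus."""
    return 1.0 - (1.0 - mutation_frequency) ** n_clones


def clones_needed(mutation_frequency, target_probability):
    """Smallest number of clones giving at least the target probability."""
    n = math.log(1.0 - target_probability) / math.log(1.0 - mutation_frequency)
    return math.ceil(n)


for denominator in (600, 9000):       # frequency range reported above
    f = 1.0 / denominator
    for target in (0.70, 0.85, 0.95):
        n = clones_needed(f, target)
        print(f"f = 1/{denominator:,}, P >= {target:.0%}: {n:,} clones")
```

Because selection-based assays miss mutations that leave enzyme activity largely intact, the per-locus frequency relevant to a sequence-based screen may be higher than 1/600, in which case fewer clones would be needed than this sketch suggests.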

Example 2 Generating An Allelic Series Of Mutations In The PKD2 Gene of Mouse Embryonic Stem Cells Using Methyl Methane Sulfonate

This Example describes the generation of a Library of mouse ES cells which contain an allelic series of mutations generated by treatment with methyl methane sulfonate (MMS) and the screening of the PKD2 gene in the Library.

1. Preparing a Library of MMS-treated mouse ES cells

Actively growing mouse ES cells (129/Sv+^(Tyr),+^(p)) which are grown and subcultured as described above (Example 1) are plated at two densities (1×10⁵ and 0.5×10⁵ cells/well) in 24-well tissue culture plates. After 1 day of preincubation in culture medium, cells are treated for one hour with 0.5 mM to 1.5 mM MMS. Surviving cells are counted and individual clones are picked and plated in 96-well microtiter dishes.

2. Screening The Library for PKD2 Alleles

Genomic DNA is isolated from MMS-treated and control (i.e., receiving no MMS) mouse ES cells using methods known in the art and the isolated genomic DNA is screened for mutations using PCR in combination with SSCP as previously described [Veldhuisen et al. (1997)]. Briefly, approximately 30 ng genomic DNA is amplified in a total volume of 15 μl by using a primer pair which is selected to amplify exon 1 [forward primer: (SEQ ID NO:1) 5′-AGGGAGGTGGAAGGGGAAGAA-3′; reverse primer: (SEQ ID NO:2) 5′-TTCTGGTTCGTGCATCTGCC-3′] of the PKD2 gene (expected product size is 335 bp), in the presence of 0.2 mM each of dGTP, dATP, and dTTP and 0.025 mM α-³²P-dCTP, 0.06 Units Supertruper polymerase (HT Biotechnology) in PCR buffer (0.1 M Tris-HCl pH 9.0, 0.5 M KCl, 0.1% gelatin, 1.5 mM MgCl₂, and 1% Triton X-100). Denaturation is for 2 min at 94° C., followed by 30 cycles of 1 min at 94° C., 2 min at 63° C., and 1 min at 72° C. and then a final extension for 9 min at 72° C. The PCR products are diluted 1:5 in SSCP loading buffer (47.5% formamide, 15 mM EDTA, 0.05% SDS, 0.05% xylene cyanole, and 0.05% bromophenol blue) and are denatured at 95° C. The PCR-amplified products are separated on 5% nondenaturing polyacrylamide gels with or without 10% glycerol at either room temperature or 4° C. Gels are exposed to Kodak XAR5 films. The pattern of radioactive bands is compared between MMS-treated and control cells. Differences in the pattern of PCR-amplified sequences obtained from control and MMS-treated cells indicate the presence of mutations in the PCR-amplified sequences. The type and location of these mutations are determined by DNA sequencing.
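The primer pair and cycling program described above lend themselves to a quick sanity check. The Python sketch below is illustrative only: it tabulates primer length and GC content for SEQ ID NO:1 and SEQ ID NO:2 and sums the nominal hold times of the program. The helper names are hypothetical, and ramp times between temperatures are ignored, so the actual run time on an instrument would be longer.

```python
# Illustrative sketch: basic checks on the exon 1 primers and PCR program.
# Helper names are hypothetical; ramp times are ignored (an assumption).

FORWARD = "AGGGAGGTGGAAGGGGAAGAA"   # SEQ ID NO:1
REVERSE = "TTCTGGTTCGTGCATCTGCC"    # SEQ ID NO:2


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def program_minutes(initial, cycles, step_minutes, final):
    """Sum of hold times: initial denaturation, cycles, final extension."""
    return initial + cycles * sum(step_minutes) + final


for name, seq in (("forward", FORWARD), ("reverse", REVERSE)):
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.1%}")

# 2 min at 94 C; 30 x (1 min 94 C, 2 min 63 C, 1 min 72 C); 9 min at 72 C
total = program_minutes(initial=2, cycles=30, step_minutes=(1, 2, 1), final=9)
print(f"nominal program time: {total} min")
```

The two primers are of similar length and GC content, which is consistent with their use at a single 63° C. annealing step.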

3. DNA Sequencing

Sequencing is performed with the Automated Laser Fluorescent DNA sequencer (ALF, Pharmacia). Products for the ALF are obtained by amplification of genomic DNA by use of the forward and reverse primers described supra, with an M13 extension (SEQ ID NO:3) 5′-CGACGTTGTAAAACGACGGCCAGT-3′ at the 5′ end of the forward primer and with a biotin label at the 5′ end of the reverse primer. The PCR products are purified by use of an Easyprep kit (Pharmacia). Single-stranded fragments of biotinylated PCR products are obtained using magnetic beads (Dynabeads). The sequence reaction is performed with a fluorescent universal or reverse primer of the autoread kit (Pharmacia).
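The tailed forward primer described above can be assembled as a simple concatenation. The Python sketch below is illustrative only: the variable names are hypothetical, and the product-size estimate assumes that the M13 extension is incorporated in full into the 335 bp exon 1 product of Example 2 and that the 5′ biotin label adds no bases.

```python
# Illustrative sketch: build the M13-tailed forward primer for sequencing.
# Assumptions: the full tail is incorporated into the product and the
# 5' biotin on the reverse primer contributes no additional bases.

M13_TAIL = "CGACGTTGTAAAACGACGGCCAGT"   # SEQ ID NO:3
FORWARD = "AGGGAGGTGGAAGGGGAAGAA"       # SEQ ID NO:1
UNTAILED_PRODUCT_BP = 335               # expected exon 1 product size

# The extension sits at the 5' end, so it precedes the gene-specific part.
TAILED_FORWARD = M13_TAIL + FORWARD

tailed_product_bp = UNTAILED_PRODUCT_BP + len(M13_TAIL)

print(f"5'-{TAILED_FORWARD}-3' ({len(TAILED_FORWARD)} nt)")
print(f"expected tailed product: {tailed_product_bp} bp")
```

Placing the universal sequence on the forward primer and the biotin on the reverse primer lets one strand be captured on beads while the other carries the priming site for the fluorescent primer.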

Example 3 Generating An Allelic Series Of Mutations In The BRCA1 Gene of Mouse Embryonic Stem Cells Using N-methyl-N′-nitro-N-nitrosoguanidine

This Example describes the generation of a Library of mouse ES cells which contain an allelic series of mutations generated by treatment with N-methyl-N′-nitro-N-nitrosoguanidine (MNNG), and the screening of the BRCA1 gene in the Library.

1. Preparing a Library of MNNG-treated mouse ES cells

Actively growing mouse ES cells (129/Sv+^(Tyr), +^(p)) which are grown and subcultured as described above (Example 1) are plated at two densities (1×10⁵ and 0.5×10⁵ cells/well) in 24-well tissue culture plates. After 2 days of preincubation in culture medium, cells are treated for 0.75 hours with 1-10 μM MNNG. Surviving cells are counted after 2 days in culture. Individual clones are picked and plated in 96-well microtiter dishes.

2. Screening And Sequencing The Library For BRCA1 Alleles

Genomic DNA is isolated from MNNG-treated cells and control cells (i.e., cells which are not treated with MNNG) using methods known in the art. Isolated genomic DNA is screened for mutations using PCR in combination with SSCP as previously described [Greenman et al. (1998)]. Briefly, primer sequences for exon amplification from genomic DNA are obtained from the ftp file at morgan.med.utal.edu [Miki et al. (1994) Science 266:66-71] except for primers for exons 6 and 7 [Friedman et al. (1994) Nat. Genet. 8:399-404]. Reverse primers are biotinylated at the 5′ end for solid-phase sequencing. Exons 2-3, 5-10 and 12-24 are amplified by PCR, and PCR-amplified products are electrophoresed using two conditions for gel electrophoresis since this has been reported to increase the sensitivity of mutation detection. The first condition uses 8% polyacrylamide, 0.16% bis-acrylamide, 5% glycerol, in 1×Tris-borate-EDTA (TBE) buffer. The second condition uses 0.5×MDE gels (JT Baker Inc., Phillipsburg, N.J.), 10% glycerol in 0.6×TBE buffer. The 8% acrylamide and 0.5×MDE gels are electrophoresed in 1×TBE and 0.6×TBE buffer, respectively, at 6 to 9 W, for 16-20 hours at 4° C. Gels are dried under vacuum at 80° C. and autoradiographed at room temperature for one to four days. The pattern of PCR-amplified sequences from control and MNNG-treated cells is compared; differences in the pattern indicate the presence of mutations in the PCR-amplified sequences. The type and location of these mutations are determined by DNA sequencing as described supra (Example 2).

Alternatively, genomic DNA is screened for mutations using fluorescent chemical cleavage of mismatch (FCCM) since this method has been reported to detect mutations in addition to those detected by SSCP [Greenman et al. (1998)]. To detect mutations in exon 11, exon 11 is amplified as three overlapping PCR products (1322 bp, 1519 bp, and 1369 bp). Each of the six primers is synthesized to generate an unmodified set and a set which is biotinylated at the 5′ end. Genomic DNA is amplified with biotinylated primers, and probed with the unmodified primers. Chemical cleavage is carried out with hydroxylamine or osmium tetroxide modification and piperidine cleavage. The products are loaded on a 5% polyacrylamide/urea gel on the ABI 373 DNA sequencer and electrophoresed at 40 W for 14 hours. Data are collected and analyzed using Genescan 672 software. Differences between the sequence of PCR-amplified BRCA1 which is derived from MNNG-treated cells and from control cells are used to determine the type and location of mutations introduced by MNNG.

From the above, it is clear that the invention provides methods for determining gene function which may efficiently be applied on a genome-wide scale, which generate more than one mutation in a gene of interest, and which do not only abrogate the function of the gene.

All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in the art and related fields are intended to be within the scope of the following claims.

1. A method of producing an allelic series of modifications in a gene of interest in a cell, comprising: a) providing: i) an in vitro culture comprising between 200 and 600 isolated tumor cells, each of said cells comprising a gene of interest; ii) a chemical agent capable of producing at least one modification in said gene of interest; b) treating said tumor cells with said chemical agent under conditions such that i) it is at least 70% probable that at least one modification in every gene in said tumor cells is produced, and ii) a mixture of tumor cells comprising said gene of interest is produced, said mixture of tumor cells comprising cells having a first modification in said gene of interest, and cells having a second modification in said gene of interest; and c) isolating said tumor cells having a first modification in said gene of interest and said tumor cells having a second modification in said gene of interest, thereby producing an allelic series of modifications in said gene of interest in the isolated tumor cells.
2. The method of claim 1, wherein said treating is under conditions such that it is at least 85% probable that at least one modification in every gene in said tumor cells is produced.
3. The method of claim 1, wherein said treating is under conditions such that it is at least 95% probable that at least one modification in every gene in said tumor cells is produced.
4. A method of producing an allelic series of modifications in a gene of interest in a cell, comprising: a) providing: i) an in vitro culture comprising isolated tumor cells, each of said cells comprising a gene of interest; ii) a chemical agent capable of producing at least one modification in said gene of interest; b) treating said tumor cells with said chemical agent under conditions such that i) cell survival is between 5 and 10 percent, ii) it is at least 70% probable that at least one modification in every gene in said tumor cells is produced, and iii) a mixture of viable tumor cells comprising said gene of interest is produced, said mixture of tumor cells comprising cells having a first modification in said gene of interest, and cells having a second modification in said gene of interest; and c) isolating said viable tumor cells having a first modification in said gene of interest and said viable tumor cells having a second modification in said gene of interest, thereby producing an allelic series of modifications in said gene of interest in the isolated tumor cells.
5. The method of claim 4, wherein said treating is under conditions such that it is at least 85% probable that at least one modification in every gene in said tumor cells is produced.
6. The method of claim 4, wherein said treating is under conditions such that it is at least 95% probable that at least one modification in every gene in said tumor cells is produced.